Recognizing handwriting images is challenging because of the vast variation in writing style across writers and the distinct linguistic characteristics of each written language. In Vietnamese, besides the modern Latin characters, there are accents and letter marks, along with characters that confuse state-of-the-art handwriting recognition methods. Moreover, because Vietnamese is a low-resource language, few datasets exist for handwriting recognition research, which creates a barrier for researchers entering the field. Recent works evaluated offline handwriting recognition methods in Vietnamese using images from an online handwriting dataset, constructed by connecting pen stroke coordinates without further processing. This approach cannot measure the ability of recognition methods effectively, as such images are trivial and may lack features that are essential in offline handwriting images. Therefore, in this paper, we propose the Transferring method to construct a handwriting image dataset that incorporates the crucial natural attributes required of offline handwriting images. Using our method, we provide the first high-quality synthetic dataset that is complex and natural enough for effectively evaluating handwriting recognition methods. In addition, we conduct experiments with various state-of-the-art methods to identify the challenges that remain in solving handwriting recognition in Vietnamese.
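The research reported in this paper transforms an ordinary trash bin into a smarter one by applying computer vision technology. With the support of sensor and actuator devices, the trash bin can automatically classify garbage. In particular, a camera on the trash bin takes pictures of the trash, the central processing unit then analyzes them and decides which bin the trash should be dropped into. The accuracy of our trash bin system reaches 90%. In addition, our model is connected to the Internet to update the bin status for further management. A mobile application has been developed for managing the bins.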
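Chinese word segmentation and part-of-speech (POS) tagging are necessary tasks in computational linguistics and in applications of natural language processing. Many researchers still debate whether Chinese word segmentation and POS tagging are needed in the deep learning era. Nevertheless, resolving ambiguities and detecting unknown words remain challenging problems in this field. Previous studies on joint Chinese word segmentation and POS tagging mainly follow character-based tagging models that focus on modeling n-gram features. Unlike previous works, we propose a neural model named SpanSegTag for joint Chinese word segmentation and POS tagging following span labeling, in which the main problem is the probability of each n-gram being a word and its POS tag. We model the n-grams with a biaffine operation over the left and right boundary representations of consecutive characters. Our experiments show that our BERT-based model SpanSegTag achieves competitive performance on CTB5, CTB6, and UD, and significant improvements on the CTB7 and CTB9 benchmark datasets, compared with the current state-of-the-art methods using BERT or ZEN encoders.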
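Interpretable machine learning seeks to understand the reasoning process of complex black-box systems, which are notorious for their lack of explainability. One growing approach to interpretation is counterfactual explanation, which goes beyond why a system arrives at a certain decision and further suggests what a user can do to alter the outcome. A counterfactual example must be able to counter the original prediction of the black-box classifier while also satisfying various constraints for practical applications. These constraints trade off against one another, posing fundamental challenges to existing works. To this end, we propose a stochastic learning-based framework that effectively balances the counterfactual trade-offs. The framework consists of a generation module and a feature selection module with complementary roles: the former aims to model the distribution of valid counterfactuals, whereas the latter enforces additional constraints in a way that allows for differentiable training and amortized optimization. We demonstrate the effectiveness of our method in generating actionable and plausible counterfactuals that are more diverse than those of existing methods, and in doing so more efficiently than counterparts of the same capacity.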
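Text classification is a typical natural language processing or computational linguistics task with various interesting applications. As the number of users on social media platforms increases, the rapid growth of data promotes emerging studies on Social Media Text Classification (SMTC), or social media text mining. In contrast to English, Vietnamese, one of the low-resource languages, has still not received concentrated attention or been exploited thoroughly. Inspired by the success of GLUE, we introduce the Social Media Text Classification Evaluation (SMTCE) benchmark, a collection of datasets and models across a diverse set of SMTC tasks. With the proposed benchmark, we implement and analyze the effectiveness of a variety of multilingual BERT-based models (mBERT, XLM-R, and DistilmBERT) and monolingual BERT-based models (PhoBERT, viBERT, vELECTRA, and viBERT4news) on the tasks in the SMTCE benchmark. Monolingual models outperform multilingual models and achieve state-of-the-art results on all text classification tasks. The benchmark provides an objective assessment of multilingual and monolingual BERT-based models, which will benefit future studies on BERTology in the Vietnamese language.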
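Interpretable machine learning offers insight into which factors drive a certain prediction of a black-box system and whether to trust it for high-stakes decisions or large-scale deployment. Existing methods mainly focus on selecting explanatory input features, following either locally additive or instance-wise approaches. Additive models use heuristically sampled perturbations to learn instance-specific explainers sequentially. The process is therefore inefficient and susceptible to poorly conditioned samples. Meanwhile, instance-wise techniques directly learn local sampling distributions and can leverage global information from other inputs. However, because of their strict reliance on pre-defined features, they can only interpret single-class predictions and suffer from inconsistency across different settings. This work exploits the strengths of both approaches and proposes a global framework for learning local explanations simultaneously for multiple target classes. We also propose an adaptive inference strategy to determine the optimal number of features for a specific instance. Our model explainer significantly outperforms additive and instance-wise counterparts on faithfulness while achieving a high level of brevity on various datasets and black-box model architectures.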
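Question answering (QA) systems have attracted explosive attention in recent years. However, few datasets exist for QA tasks in Vietnamese. Notably, there are almost no datasets in the medical domain. Therefore, we built a Vietnamese Healthcare Question Answering dataset (ViHealthQA) for this task, comprising 10,015 question-answer passage pairs, in which the questions were asked by health-interested users on prestigious health websites and the answers were given by highly qualified experts. This paper proposes a two-stage QA system based on Sentence-BERT (SBERT), trained with multiple negatives ranking (MNR) loss and combined with BM25. We then conduct diverse experiments with many bag-of-words models to assess the system's performance. According to the obtained results, the system performs better than traditional methods.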
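As a rough illustration of the lexical stage that BM25 could play in such a two-stage pipeline, the sketch below scores passages with the standard Okapi BM25 formula and keeps the top candidates for a later re-ranker. It is not the authors' implementation: the function names `bm25_scores` and `top_k`, the whitespace tokenization, the smoothed IDF variant, and the defaults `k1=1.5` and `b=0.75` are all assumptions made for this example, and a non-empty corpus is assumed.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in `docs` against `query` with Okapi BM25.

    Hypothetical helper: whitespace tokenization and the k1/b defaults
    are illustrative assumptions, not settings taken from the paper.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            df = doc_freq[term]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            tf = term_freq[term]
            # Length normalization: long documents are penalized via b.
            norm = tf + k1 * (1.0 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores.append(score)
    return scores


def top_k(query, docs, k=2):
    """Return indices of the k highest-scoring documents (stage-one candidates)."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
```

In a two-stage setup, the indices returned by `top_k` would be handed to a dense model such as an SBERT encoder for re-ranking; that second stage is omitted here because the abstract does not describe how the two scores are combined.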
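Question answering (QA) is a natural language understanding task within the fields of information retrieval and information extraction that has attracted much attention from the computational linguistics and artificial intelligence research communities in recent years, owing to the strong development of machine reading comprehension-based models. A reader-based QA system is a high-level search engine that can find correct answers to queries or questions in open-domain or domain-specific texts using machine reading comprehension (MRC) techniques. Most advances in data resources and machine learning approaches for MRC and QA systems have been made in two resource-rich languages, English and Chinese. A low-resource language like Vietnamese has seen little research on QA systems. This paper presents XLMRQA, the first Vietnamese QA system that uses a transformer-based reader over a Wikipedia-based textual knowledge source (using the UIT-ViQuAD corpus), outperforming DrQA and BERTserini, two robust QA systems using deep neural network models, by 24.46% and 6.28%, respectively. From the results obtained on the three systems, we analyze the influence of question types on the performance of the QA systems.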
We propose a new causal inference framework to learn causal effects from multiple, decentralized data sources in a federated setting. We introduce an adaptive transfer algorithm that learns the similarities among the data sources by utilizing Random Fourier Features to disentangle the loss function into multiple components, each of which is associated with a data source. The data sources may have different distributions; the causal effects are independently and systematically incorporated. The proposed method estimates the similarities among the sources through transfer coefficients and hence requires no prior information about the similarity measures. The heterogeneous causal effects can be estimated without sharing the raw training data among the sources, thus minimizing the risk of privacy leakage. We also provide minimax lower bounds to assess the quality of the parameters learned from the disparate sources. The proposed method is empirically shown to outperform the baselines on decentralized data sources with dissimilar distributions.
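For readers unfamiliar with Random Fourier Features, the sketch below shows only the generic feature map that approximates an RBF kernel by an inner product of explicit features, which is the property that lets a kernel-based loss be written as a sum of per-source terms. It is not the adaptive transfer algorithm described above: the names `make_rff` and `rbf_kernel`, the choice of the RBF kernel, and the `lengthscale` and `seed` parameters are assumptions for this example.

```python
import math
import random


def make_rff(dim, n_features, lengthscale=1.0, seed=0):
    """Build a random Fourier feature map for the RBF kernel.

    Hypothetical helper: frequencies are drawn from N(0, 1/lengthscale^2)
    and phases from U[0, 2*pi], the standard construction for approximating
    k(x, y) = exp(-||x - y||^2 / (2 * lengthscale^2)).
    """
    rng = random.Random(seed)
    weights = [
        [rng.gauss(0.0, 1.0 / lengthscale) for _ in range(dim)]
        for _ in range(n_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def feature_map(x):
        # One cosine feature per sampled frequency/phase pair.
        return [
            scale * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return feature_map


def rbf_kernel(x, y, lengthscale=1.0):
    """Exact RBF kernel value, used here only as a reference."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * lengthscale ** 2))
```

With a feature map `phi = make_rff(2, 5000)`, the dot product of `phi(x)` and `phi(y)` should be close to `rbf_kernel(x, y)`, and the approximation error shrinks as `n_features` grows.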
Video understanding is a growing field and a subject of intense research, which includes many interesting tasks for understanding both spatial and temporal information, e.g., action detection, action recognition, video captioning, and video retrieval. One of the most challenging problems in video understanding is feature extraction, i.e., extracting a contextual visual representation from a given untrimmed video, because of the long and complicated temporal structure of unconstrained videos. Unlike existing approaches, which apply a pre-trained backbone network as a black box to extract visual representations, our approach aims to extract the most contextual information with an explainable mechanism. As we observed, humans typically perceive a video through the interactions between three main factors, i.e., the actors, the relevant objects, and the surrounding environment. Therefore, it is crucial to design a contextual, explainable video representation extraction method that can capture each of these factors and model the relationships between them. In this paper, we discuss approaches that incorporate the human perception process into modeling actors, objects, and the environment. We choose video paragraph captioning and temporal action detection to illustrate the effectiveness of human perception-based contextual representation in video understanding. Source code is publicly available at https://github.com/UARK-AICV/Video_Representation.